Backend of Programming Languages

Author

Ken Pu

1 Turing Completeness

1.1 TM as the limit of computation

Theorem (Church Turing Thesis)

Every algorithm has a TM that implements it.

Definition: Turing Completeness

A programming language is Turing-complete if it is equality expressive as Turing machines.

1.2 Elements of a Turing-complete language

  • Data representation
  • Array-like memory storage
  • Loop
  • Branch control

Java is a Turing complete language.

int[] numbers = new int[]{1, 2, 3, 4, 5};

while(...) {
    ...
}

if(...) {
    ...
} else {
    ...
}

1.3 Other Turing-complete languages

1.3.1 Python

numbers = [1,2,3,4]
while(...):
    ...
    
if ...:
    ...
else:
    ...

1.3.2 Clojure

(let [numbers [1 2 3 4]]
    ...)

(loop ... (recur ...))
(if ...)

1.3.3 SQL

SQL is also Turing-complete using recursive queries.

2 Minimal And Efficient Backend Language

2.1 Why not Turing Machine

  • TM is too inefficient.
  • TM lacks strong types.
  • TM does not support modern processor’s optimization features.

2.2 Java Bytecode

A simple instruction based runtime environment consisting of:

  1. An operand stack (of 32-bit or 64-bit cells)
  2. Local variable array
  3. A heap for dynamic memory allocation
  4. Constant pool for defining constants

Operand stack with three values

 | <x> |
 | <y> |
 | <z> |
 +-----+

Each x, y or z are either:

  • integer, float, hot-spot reference (32b)
  • long, double, references (64b)
 | <x> |          
 | <y> |   --->   | <w> |
 | <z> |   iadd   | <z> |
 +-----+          +-----+

iadd: add the two integers from the top of the stack, and place the result back onto the stack.

2.3 From Java to Bytecode

A minimal Java program.

public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}

In JVM bytecode, this is converted into much simpler (but more verbose) list of instructions.

.class public Hello
.super java/lang/Object

Declaration of a Java class named Hello.

.method public <init>()V
    aload_0
    invokenonvirtual java/lang/Object/<init>()V
    return
.end method

Declares a constructor which invokes the superclass constructor.

.method public static main([Ljava/lang/String;)V
    .limit stack 2
    
    getstatic java/lang/System/out Ljava/io/PrintStream;
    ldc "Hello world"
    invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
    return
.end method

The public static void main(String[] args) method that performs stack based computation.

  • allocate a stack of maximum depth 2
  • places an object reference on the stack (System.out). It also specifies its Java type java.io.PrintStream.
  • places a constant string on top of the stack.
  • invokes a method println(...) which consumes two from the stack: the string and the PrintStream.

2.4 Some examples of Bytecode programming

2.4.1 Adding numbers

Note

For example, we omit the class declaration and constructor implmentation in bytecode. Below is the main method implementation.

.method public static main([Ljava/lang/String;)V
    .limit stack 100
    .limit locals 2
    
    ldc 200
    ldc 300
    iadd
    istore_1
    
    getstatic java/lang/System/out Ljava/io/PrintStream;
    iload_1
    invokevirtual java/io/PrintStream/println(I)V
    return
.end method

Equivalent to the code:

x = 200 + 300
print(x)

2.4.2 Adding numbers using a loop

Note

For this example, we are only going to show the body of the main method. The complete bytecode would require the class declaration, constructor implementation, and the main method declaration code. (See above).

.limit stack 100
.limit locals 2

    
    

We will have a larger stack in case we need more space for computation. We will have two local variables.

  1. The this variable that refers to the runtime object – this is always needed.
  2. The accumulator integer.
; #1 is the total sum
iconst_0
istore_1

This initializes the local variable using the stack.

  • Place 0 on top of the stack
  • Move the top of the stack to variable #1
; load a counter value
ldc 1000

Store 1000 to the top of the stack. This will be the count-down value used during the loop.

Important: We will keep the count-down value in the stack until the end of the loop.

LOOP:

This is a label that marks the start of the loop.

; break the loop if reach zero
dup
ifeq END_LOOP

Test the top of the stack to be zero. If it is zero, then break out of the loop by branching to END_LOOP label.

Important: we duplicate the top of the stack so that that test ifeq does not remove the original count-down value.

; add the counter value to sum
dup
iload_1
iadd
istore_1

Add the count-down value (a copy) to the local variable #1, and then store the result back to the local variable #1.

; decrement counter
iconst_1
isub

Modify the count-down value on the stack by decrementing it by 1.

goto LOOP
END_LOOP:

End of the loop

; print the total
iload_1
getstatic java/lang/System/out Ljava/io/PrintStream;
iload_1
invokevirtual java/io/PrintStream/println(I)V
return

After the loop, we print the value of the local variable #1.

2.5 Towards a better programming interface

We need a programming language for:

  1. Semantics: Better abstraction of a runtime environment.
  2. Syntax: Better symbolic description of the computing process.
  3. Verification: Safer communication to minimize human errors.
  4. Compilation: Generate code into Java Bytecode for fast execution.